4 research outputs found

Asynchronous Training of Word Embeddings for Large Text Corpora

Author: Almuhareb A.
Boucher T.
Garten J.
Ghannay S.
Goikoetxea J.
Jurgens D. A.
Levy O.
Li Y.
Luong M.-T.
Mikolov T.
Recht B.
Socher R.
Stergiou S.
Vuurens J. B. P.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 07/12/2018

Word embeddings are a powerful approach for analyzing language and have been widely popular in numerous tasks in information retrieval and text mining. Training embeddings over huge corpora is computationally expensive because the input is typically processed sequentially and parameters are updated synchronously. Distributed architectures for asynchronous training that have been proposed either focus on scaling vocabulary sizes and dimensionality or suffer from expensive synchronization latencies. In this paper, we propose a scalable approach to train word embeddings that instead partitions the input space, in order to scale to massive text corpora without sacrificing the performance of the embeddings. Our training procedure does not involve any parameter synchronization except a final sub-model merge phase that typically executes in a few minutes. Our distributed training scales seamlessly to large corpus sizes, and we obtain comparable performance, and sometimes improvements of up to 45%, on a variety of NLP benchmarks using models trained by our distributed procedure, which requires 1/10 of the time taken by the baseline approach. Finally, we show that our approach is robust to missing words in sub-models and can effectively reconstruct word representations. Comment: This paper contains 9 pages and has been accepted in the WSDM201

arXiv.org e-Print Archive

Crossref

"Picture the scene..."

Author: Benevenuto F.
Chakrabarti D.
Clarke C.
Duan Y.
Finin T.
Foo J. J.
Hong J.
Salembier P.
Shamma L. K. D. A.
Sharifi B. P.
Tang Z.
Vuurens A. P. D. V. J.
Weng J.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014

Due to the advent of social media and Web 2.0, we are faced with a deluge of information; recently, research efforts have focused on filtering out noisy, irrelevant information items from social media streams and in particular have attempted to automatically identify and summarise events. However, due to the heterogeneous nature of such social media streams, these efforts have not reached fruition. In this paper, we investigate how images can be used as a source for summarising events. Existing approaches have considered only textual summaries, which are often poorly written, in a different language and slow to digest. Alternatively, images are "worth 1,000 words" and are able to quickly and easily convey an idea or scene. Since images in social media can also be noisy, irrelevant and repetitive, we propose new techniques for their automatic selection, ranking and presentation. We evaluate our approach on a recently created social media event data set containing 365k tweets and 50 events, which we extend by collecting 625k related images. By conducting two crowdsourced evaluations, we first show how our approach overcomes the problems of automatically collecting relevant and diverse images from noisy microblog data, before highlighting the advantages of multimedia summarisation over text-based approaches.

Crossref

Enlighten

Crowdsourcing for information retrieval

Author: Alonso O.
Eickhoff C.
Elsas J. L.
Yilmaz E.
Lease M.
Schone P.
Schwartz B.
Smucker M.
Stone M.
Su Q.
Tai L.
Tang W.
Vallet D.
Vuurens J.
Wang J.
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

Considering human aspects on strategies for designing and managing distributed human computation

Author: A Ghosh
A Kapteyn
A Karsenty
A Kittur
A Kulkarni
A Kumar
A Mao
A Marcus
A Taylor
A Whitefield
AD Shaw
AH Maslow
AJ Quinn
B Kiepuszewski
B Satzger
C Dorn
CH Lin
CJ Lintott
D Chandler
D Georgakopoulos
D Hovy
D Schall
D Wang
DL Hansen
DW Barowy
DW McMillan
E Law
EA Locke
EL Deci
F Dietrich
G Little
H Heidari
H Rao
H Zhang
HA Simon
J Bragg
J Cardoso
J Coleman
J Grudin
J Noronha
J Rasmussen
J Reason
J Rogstadius
J Ross
J Rzeszotarski
J Sweller
J Vuurens
J Whitehill
J Witkowski
J Yi
J Yu
J-J Laffont
JG Nicholls
JJ Chen
JJ Gross
JR Edwards
JR Hackman
JT Jacques
K Stanovich
KB Misra
L Ponciano
L von Ahn
L Yu
LB Chilton
LC Irani
LF Cranor
M Kearns
M Salek
M Toomim
M Vukovic
M-C Yuen
MS Bernstein
MS Silberman
N Archak
N Ram
N Savage
NM Ball
O Amir
O Scekic
P Dai
P Jalote
P Kinnaird
P Pettit
P Venetis
PG Ipeirotis
PJ Denning
QVH Nguyen
R Morris
R Parasuraman
RD Alexander
RJ Crouser
RJ Dolan
RM de Araújo
RW Picard
S Dow
S Dustdar
S Khanna
S Kochhar
S-W Huang
T Malone
TP Waterhouse
U Lee
V Ambati
VS Sheng
W Cirne
W Mason
W Van der Aalst
Y-A Sun
Publication venue: 'Springer Science and Business Media LLC'

Crossref